Kai-Wei Chang

Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability

Feb 02, 2026
Via arXiv

AQAScore: Evaluating Semantic Alignment in Text-to-Audio Generation via Audio Question Answering

Jan 21, 2026
Via arXiv

On the Fallacy of Global Token Perplexity in Spoken Language Model Evaluation

Jan 09, 2026
Via arXiv

GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation

Dec 18, 2025
Via arXiv

MotionEdit: Benchmarking and Learning Motion-Centric Image Editing

Dec 14, 2025
Via arXiv

From Narrow Unlearning to Emergent Misalignment: Causes, Consequences, and Containment in LLMs

Nov 18, 2025
Via arXiv

Reinforcing Stereotypes of Anger: Emotion AI on African American Vernacular English

Nov 13, 2025
Via arXiv

LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training

Oct 16, 2025
Via arXiv

DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation

Oct 16, 2025
Via arXiv

Full-Duplex-Bench-v2: A Multi-Turn Evaluation Framework for Duplex Dialogue Systems with an Automated Examiner

Oct 09, 2025
Via arXiv